
Copenhagen University Research Information System

PuSH

University of Southern Denmark Research Output

The South Asian genome

Author: Abbott J
Afaq S
Afzal U
Aitman TJ
Al-Hussaini A
Butcher S
Chambers JC
Elliott P
Gaulton KJ
Geoghegan F
Grewal J
Kooner IK
Kooner JS
Lavery A
Lehne B
Lewin AM
Li X
Li Y
Loh M
McCarthy MI
Miller K
Mills R
Northwood K
O'Reilly P
Oozageer L
Panoulas V
Pearson RD
Scott J
Scott WR
Sehmi J
Tan ST
Turro E
Vandrovcova J
Wander GS
Wang J
Wass MN
Zhang W
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014
Field of study: Genetics of disease; Microarrays; Variant genotypes; Population genetics; Sequence alignment; Alleles

The genetic sequence variation of people from the Indian subcontinent, who comprise one-quarter of the world's population, is not well described. We carried out whole-genome sequencing of 168 South Asians, along with whole-exome sequencing of 147 South Asians to provide deeper characterisation of coding regions. We identify 12,962,155 autosomal sequence variants, including 2,946,861 new SNPs and 312,738 novel indels. This catalogue of SNPs and indels amongst South Asians provides the first comprehensive map of genetic variation in this major human population, and reveals evidence for selective pressures on genes involved in skin biology, metabolism, infection and immunity. Our results will accelerate the search for the genetic variants underlying susceptibility to disorders such as type-2 diabetes and cardiovascular disease, which are highly prevalent amongst South Asians.

Whole genome sequencing to discover genetic variants underlying type-2 diabetes, coronary heart disease and related phenotypes amongst Indian Asians. Imperial College Healthcare NHS Trust cBRC 2011-13 (JS Kooner [PI], JC Chambers)

Edinburgh Research Explorer

Spiral - Imperial College Digital Repository

Brunel University Research Archive

University of Queensland eSpace

FigShare

Public Library of Science (PLOS)

LSHTM Research Online

Teesside University's Research Repository

eScholarship - University of California

Oxford University Research Archive

Kent Academic Repository

TEAD and YAP regulate the enhancer network of human embryonic pancreatic progenitors.

Author: A Kapoor
A Krapp
A Rada-Iglesias
A Sawada
A-C Binot
AF Hezel
AJ Saldanha
B Zhao
C Cortijo
C Haumaitre
C Trapnell
CHH Cho
CY McLean
DA Stoffers
DW Huang
E Kroon
E Rodríguez-Seguel
EF Chiang
F Argenton
F Esni
F Supek
FC Lynn
FC Pan
H Fang
H Lango Allen
I Morán
I Rooman
J Bessa
J van Arensbergen
JM Oliver-Krasinski
K Kawakami
K Piper
K Skouloudaki
KJ Gaulton
KM Petzold
KS Zaret
L Elghazi
L Pasquali
M Borowiak
M Carrasco
M Gannon
MA Maestro
MC Whitlock
MF Offield
MJL de Hoon
MN Weedon
MP Creyghton
N Bardeesy
N Gao
NM George
P Jacquemin
PA Seymour
R O’Rahilly
R Xie
RE Jennings
RF Luco
S Gupta
S Heinz
S Xuan
T Derrien
T Jowett
W Zhang
WA Whyte
Y Fujitani
Y Liu-Chittenden
Publication venue: Springer Science and Business Media LLC
Publication date: 13/03/2015
Field of study

The genomic regulatory programmes that underlie human organogenesis are poorly understood. Pancreas development, in particular, has pivotal implications for pancreatic regeneration, cancer and diabetes. We have now characterized the regulatory landscape of embryonic multipotent progenitor cells that give rise to all pancreatic epithelial lineages. Using human embryonic pancreas and embryonic-stem-cell-derived progenitors, we identify stage-specific transcripts and associated enhancers, many of which are co-occupied by transcription factors that are essential for pancreas development. We further show that TEAD1, a Hippo signalling effector, is an integral component of the transcription factor combinatorial code of pancreatic progenitor enhancers. TEAD and its coactivator YAP activate key pancreatic signalling mediators and transcription factors, and regulate the expansion of pancreatic progenitors. This work therefore uncovers a central role for TEAD and YAP as signal-responsive regulators of multipotent pancreatic progenitors, and provides a resource for the study of embryonic development of the human pancreas.

University of Birmingham Research Portal

CONICET Digital

Spiral - Imperial College Digital Repository

The University of Manchester - Institutional Repository

In silico prioritisation of candidate genes for prokaryotic gene function discovery: an application of phylogenetic profiles

Author: C Médigue
C Perez-Iratxeta
CM Fraser
DM Raskin
EA Adie
EC Lin
EM Marcotte
Enrico Coiera
Frank PY Lin
FS Turner
G Michal
IH Witten
J Freudenberg
J Wu
JP Gogarten
JP Vert
KJ Gaulton
M Kanehisa
M Pellegrini
MY Galperin
N López-Bigas
N Tiffin
PD Karp
R Jothi
Ruiting Lan
S Aerts
Vitali Sintchenko
WJ Kent
Y Yamanishi
Y Zheng
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Background: In silico candidate gene prioritisation (CGP) aids the discovery of gene functions by ranking genes according to an objective relevance score. While several CGP methods have been described for identifying human disease genes, corresponding methods for prokaryotic gene function discovery are lacking. Here we present two prokaryotic CGP methods, based on phylogenetic profiles, to assist with this task.

Results: Using gene occurrence patterns in sample genomes, we developed two CGP methods (statistical and inductive CGP) to assist with the discovery of bacterial gene functions. Statistical CGP exploits differences in gene frequency between phenotypic groups, while inductive CGP applies supervised machine learning to identify gene occurrence patterns across genomes. Three rediscovery experiments were designed to evaluate the CGP frameworks. The first experiment attempted to rediscover peptidoglycan genes using 417 published genome sequences. Both CGP methods achieved best areas under the receiver operating characteristic curve (AUC) of 0.911 in the Escherichia coli K-12 (EC-K12) genome and 0.978 in the Streptococcus agalactiae 2603 (SA-2603) genome, with an average improvement in precision of >3.2-fold and a maximum of >27-fold using statistical CGP. A median AUC of >0.95 could still be achieved with as few as 10 example genomes in each group in the rediscovery of the peptidoglycan metabolism genes. In the second experiment, a maximum 109-fold improvement in precision was achieved in the rediscovery of anaerobic fermentation genes in EC-K12. The last experiment attempted to rediscover genes from 31 metabolic pathways in SA-2603, where 14 pathways achieved AUC >0.9 and 28 pathways achieved AUC >0.8 with the best inductive CGP algorithms.

Conclusion: Our results demonstrate that the two CGP methods can assist with the study of functionally uncategorised genomic regions and the discovery of bacterial gene-function relationships. Our rediscovery experiments also provide a set of standard tasks against which future methods may be compared.

12 page(s)
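The abstract describes statistical CGP only as comparing gene frequency between phenotypic groups. A minimal Python sketch of that idea is shown below, assuming binary phylogenetic profiles (the set of genomes in which each gene occurs) and a plain difference in occurrence frequency as the score; the paper's actual statistic may differ. AUC is computed with the rank-based (Mann-Whitney) formulation, and all gene and genome identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def statistical_cgp(
    profiles: Dict[str, Set[str]], positive: Set[str], negative: Set[str]
) -> List[Tuple[str, float]]:
    """Rank genes by occurrence frequency in positive minus negative genomes.

    profiles maps each gene to the set of genome IDs in which it occurs
    (a binary phylogenetic profile). This scoring rule is an assumption.
    """
    ranked = []
    for gene, genomes in profiles.items():
        freq_pos = len(genomes & positive) / len(positive)
        freq_neg = len(genomes & negative) / len(negative)
        ranked.append((gene, freq_pos - freq_neg))
    # Highest score first; ties broken alphabetically for reproducibility.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


def auc(ranked: List[Tuple[str, float]], relevant: Set[str]) -> float:
    """Probability that a relevant gene outscores an irrelevant one (ties count 0.5)."""
    pos = [s for g, s in ranked if g in relevant]
    neg = [s for g, s in ranked if g not in relevant]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: genomes g1 and g2 carry the phenotype, g3 and g4 do not.
profiles = {
    "murA": {"g1", "g2", "g3"},
    "murB": {"g1", "g2"},
    "lacZ": {"g3", "g4"},
    "recA": {"g1", "g2", "g3", "g4"},
}
ranked = statistical_cgp(profiles, positive={"g1", "g2"}, negative={"g3", "g4"})
print(ranked)  # [('murB', 1.0), ('murA', 0.5), ('recA', 0.0), ('lacZ', -1.0)]
print(auc(ranked, relevant={"murA", "murB"}))  # 1.0
```

A gene present in every genome (recA here) scores zero, because it does not discriminate between the groups; that is the intuition behind using occurrence patterns rather than presence alone.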

Macquarie University ResearchOnline

UNSWorks

A Computational Method Based on the Integration of Heterogeneous Networks for Predicting Disease-Gene Associations

Author: A Hamosh
A Rzhetsky
Anguo Dong
BM Schjeide
C Reitz
Chunshui Wei
F Zou
H Liang
H Ogata
Indra Neil Sarkar
JA Prince
K Lage
KI Goh
KJ Gaulton
LA Farrer
LG Biesecker
Lin Gao
M Ashburner
M Oti
MA van Driel
O Vanunu
R Anderson
R Jüri
RA Oldenburg
S Kohler
S Peri
TKB Gandhi
X Wu
Xiaofei Yang
Xingli Guo
YC Huang
Yi Zhao
Publication venue: Public Library of Science
Publication date: 01/01/2011
Field of study

The identification of disease-causing genes is a fundamental challenge in human health: it is of great importance for improving medical care and provides a better understanding of gene function. Recent computational approaches based on the interactions among human proteins and on disease similarities have shown their power in tackling this problem. In this paper, a novel, systematic and global method that integrates two heterogeneous networks to prioritize candidate disease-causing genes is presented, based on the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein interactions. In this method, the association score between a query disease and a candidate gene is defined as the weighted sum of all the association scores between similar diseases and neighbouring genes. Moreover, the topological correlation of these two heterogeneous networks can be incorporated into the definition of the score function, and an iterative algorithm is designed to compute it. The method was tested with 10-fold cross-validation on all 1,126 diseases that have at least one known causal gene, and it ranked the correct gene among the top ten in 622 of the 1,428 cases, significantly outperforming a state-of-the-art method called PRINCE. The method was then applied to three multifactorial disorders (breast cancer, Alzheimer disease and type 2 diabetes mellitus), and suggestions of novel causal genes and candidate disease-causing subnetworks are provided for further investigation.
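The abstract states the scoring rule only in words. The Python sketch below is one assumed formalization of it: disease-similarity and gene-interaction weights are row-normalized, and the score matrix is repeatedly updated as a weighted sum over similar diseases and neighbouring genes, plus a restart term on the known associations, until it stops changing. The restart weight `alpha`, the normalization and the stopping rule are assumptions rather than settings from the paper, and the topological-correlation term mentioned in the abstract is omitted.

```python
from typing import List

Matrix = List[List[float]]


def row_normalize(m: Matrix) -> Matrix:
    """Scale each row to sum to 1; all-zero rows stay zero."""
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def prioritize(
    disease_sim: Matrix,
    gene_net: Matrix,
    known: Matrix,
    alpha: float = 0.5,
    tol: float = 1e-9,
    max_iter: int = 1000,
) -> Matrix:
    """Iteratively score every (disease, gene) pair (assumed update rule).

    score[d][g] = alpha * sum_{d2, g2} wd[d][d2] * wg[g][g2] * score[d2][g2]
                  + (1 - alpha) * known[d][g]
    """
    wd = row_normalize(disease_sim)
    wg = row_normalize(gene_net)
    n_d, n_g = len(known), len(known[0])
    scores = [row[:] for row in known]
    for _ in range(max_iter):
        new = [[0.0] * n_g for _ in range(n_d)]
        for d in range(n_d):
            for g in range(n_g):
                # Weighted sum over diseases similar to d and genes neighbouring g.
                acc = 0.0
                for d2 in range(n_d):
                    for g2 in range(n_g):
                        acc += wd[d][d2] * wg[g][g2] * scores[d2][g2]
                new[d][g] = alpha * acc + (1 - alpha) * known[d][g]
        delta = max(abs(new[d][g] - scores[d][g]) for d in range(n_d) for g in range(n_g))
        scores = new
        if delta < tol:
            break
    return scores


# Hypothetical toy case: disease 0 is caused by gene 0; disease 1 (the query)
# is similar to disease 0; gene 1 interacts with gene 0; gene 2 is isolated.
disease_sim = [[0, 1], [1, 0]]
gene_net = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
known = [[1, 0, 0], [0, 0, 0]]
scores = prioritize(disease_sim, gene_net, known)
print([round(s, 3) for s in scores[1]])  # [0.0, 0.333, 0.0]
```

With `0 < alpha < 1` and row-normalized weights, each update is a contraction, so the loop converges. In the toy case, the query disease has no known gene but gene 1 receives a score through its interaction with gene 0 and the similarity between the two diseases.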

CiteSeerX

Public Library of Science (PLOS)

An Open Access Database of Genome-wide Association Results

Author: A Brazma
AD Johnson
Andrew D Johnson
BL Browning
BR Zeeberg
C Dong
Christopher J O'Donnell
CJ Willer
D Curtis
DM Kraus
DV Zaykin
E Hamano
E Zeggini
GK Chen
H Stoiber
J Fellay
KA Frazer
KJ Gaulton
KM Brown
LA Cupples
LL Field
M Fedetz
MD Mailman
QR Liu
R Saxena
R Thibault
S Knapp
SA Mousa
SF Saccone
SJ Chanock
TA Manolio
TG Lesnick
WTCCC consortium
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Background: The number of genome-wide association studies (GWAS) is growing rapidly, leading to the discovery and replication of many new disease loci. Combining results from multiple GWAS datasets may strengthen previous conclusions and suggest new disease loci, pathways or pleiotropic genes. However, no database or centralized resource currently exists that contains anywhere near the full scope of GWAS results.

Methods: We collected available results from 118 GWAS articles into a database of 56,411 significant SNP-phenotype associations and accompanying information, and we make this database freely available. In doing so, we encountered a number of challenges to creating an open access database of GWAS results, which we describe here. Through preliminary analyses and characterization of available GWAS, we demonstrate the potential to gain new insights by querying a database across GWAS.

Results: Using a genomic bin-based density analysis to search for highly associated regions of the genome, positive control loci (e.g., MHC loci) were detected with high sensitivity. Likewise, an analysis of highly repeated SNPs across GWAS identified replicated loci (e.g., APOE, LPL). At the same time, we identified novel, highly suggestive loci for a variety of traits that did not meet genome-wide significance thresholds in prior analyses, in some cases with strong support from the primary medical genetics literature (SLC16A7, CSMD1, OAS1), suggesting that these genes merit further study. Additional adjustment for linkage disequilibrium within most regions with a high density of GWAS associations did not materially alter our findings. Having a centralized database with standardized gene annotation also allowed us to examine the representation of functional gene categories (gene ontologies) containing one or more associations among top GWAS results. Genes relating to cell adhesion functions were highly over-represented among significant associations (p < 4.6 × 10⁻¹⁴), a finding that was not perturbed by a sensitivity analysis.

Conclusion: We provide access to a full gene-annotated GWAS database that could be used for further querying, analyses or integration with other genomic information. We make a number of general observations. Of reported associated SNPs, 40% lie within the boundaries of a RefSeq gene and 68% are within 60 kb of one, indicating a bias toward gene-centricity in the findings. We found considerable heterogeneity in the information available from GWAS, suggesting that the wider community could benefit from standardization and centralization of results reporting.
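Two of the calculations summarized above are simple enough to sketch: counting associations per fixed-width genomic bin (the density analysis) and measuring what fraction of reported SNPs fall inside, or within 60 kb of, a gene. The Python below assumes SNPs and genes are plain (chromosome, coordinate) tuples and uses a 1 Mb bin width; the bin width and the toy coordinates are illustrative assumptions, not values from the paper, and real RefSeq gene boundaries would have to be loaded separately.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Snp = Tuple[str, int]        # (chromosome, position)
Gene = Tuple[str, int, int]  # (chromosome, start, end), inclusive


def bin_density(snps: Iterable[Snp], bin_size: int = 1_000_000) -> Counter:
    """Count associated SNPs per (chromosome, bin index); bin size is assumed."""
    return Counter((chrom, pos // bin_size) for chrom, pos in snps)


def nearest_gene_distance(snp: Snp, genes: List[Gene]) -> Optional[int]:
    """Distance in bp to the closest gene on the same chromosome (0 if inside one)."""
    chrom, pos = snp
    best = None
    for g_chrom, start, end in genes:
        if g_chrom != chrom:
            continue
        d = max(start - pos, 0, pos - end)  # 0 when start <= pos <= end
        if best is None or d < best:
            best = d
    return best


def gene_proximity(snps: List[Snp], genes: List[Gene], window: int = 60_000) -> Tuple[float, float]:
    """Fractions of SNPs inside a gene and within `window` bp of a gene."""
    dists = [nearest_gene_distance(s, genes) for s in snps]
    inside = sum(d == 0 for d in dists)
    near = sum(d is not None and d <= window for d in dists)
    return inside / len(snps), near / len(snps)


# Hypothetical coordinates for illustration only.
genes = [("chr1", 100_000, 150_000), ("chr2", 5_000_000, 5_010_000)]
snps = [("chr1", 120_000), ("chr1", 190_000), ("chr1", 1_500_000), ("chr2", 5_005_000)]
density = bin_density(snps)
print(density[("chr1", 0)], density[("chr1", 1)], density[("chr2", 5)])  # 2 1 1
print(gene_proximity(snps, genes))  # (0.5, 0.75)
```

Bins with unusually high counts would be the candidate "highly associated regions"; the two fractions correspond to the within-gene and within-60-kb percentages quoted in the Conclusion.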

Springer - Publisher Connector

Design of an allele-specific PCR assay to genotype the rs12255372 SNP in a pilot study of association between common TCF7L2 polymorphisms and type 2 diabetes in Venezuelans

Author: Adeghate E
Assmann TS
Barra GB
Barros CM
Bodhini D
Campbell DD
Cauchi S
Chang YC
Cruz M
Danquah I
Dutra LAS
Duval A
Franco LF
Furgeri DT
Gamboa-Meléndez MA
Gaulton KJ
Grant SF
Groves CJ
Guinan KJ
Hayashi T
Imamura M
Lange EM
Lehman DM
Lozano R
Lyssenko V
Marquezine GF
Martínez-Gómez LE
Ntzani EE
Parra EJ
Peng S
Sepúlveda J
Shu L
Sladek R
Stumvoll M
Tong Y
Wagner R
Wang J
Yi F
Zhang C
Publication venue: FapUNIFESP (SciELO)
Publication date
Field of study

Clustered ChIP-Seq-defined transcription factor binding sites and histone modifications map distinct classes of regulatory elements

Author: A Barski
A Kanhere
A Marson
A Pekowska
A Rada-Iglesias
A Visel
AP Boyle
B Li
BE Bernstein
CM Koch
CZ Zang
D Karolchik
DS Johnson
E Birney
E Lieberman-Aiden
Finn Drabløs
G Hon
GA Wray
GE Zentner
H Xu
H Yu
J Ernst
J Kim
JE Phillips
JM Vaquerizas
KJ Gaulton
KJ Won
KL MacQuarrie
L Ooi
LA Pennacchio
M Blanchette
M Bulger
M Gupta
M Guttman
MA Nobrega
MB Rye
MC Tsai
MH Kagey
Morten Rye
MP Creyghton
ND Heintzman
O Wallerman
PJ Farnham
PJ Park
PV Kharchenko
Pål Sætrom
Q Zhou
R Jothi
S Cuddapah
S Roy
T Kouzarides
T Li
T Ravasi
TH Kim
TK Kim
Tony Håndstad
TS Mikkelsen
V Gotea
W Niu
X Chen
Y Zhang
Z Wang
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Background: Transcription factor binding to DNA requires both an appropriate binding element and suitably open chromatin, which together help to define regulatory elements within the genome. Current methods of identifying regulatory elements, such as promoters or enhancers, typically rely on sequence conservation, existing gene annotations or specific marks, such as histone modifications and p300 binding, and each of these methods has its own biases.

Results: Herein we show that an approach based on clustering of transcription factor peaks from chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq) can be used to evaluate markers for regulatory elements. We used 67 data sets for 54 unique transcription factors distributed over two cell lines to create regulatory element clusters. By integrating the clusters from our approach with histone modifications and data for open chromatin, we identified general methylation of lysine 4 on histone H3 (H3K4me) as the most specific marker for transcription factor clusters. Clusters mapping to annotated genes showed distinct patterns in cluster composition related to gene expression and histone modifications. Clusters mapping to intergenic regions fall into two groups: those directly involved in transcription, including miRNAs and long noncoding RNAs, and those facilitating transcription through long-range interactions. The latter clusters were specifically enriched with H3K4me1, but less so with acetylation of lysine 27 on histone H3 or with p300 binding.

Conclusion: By integrating genome-wide data on transcription factor binding and chromatin structure with our data-driven approach, we pinpointed the chromatin marks that best explain transcription factor association with different regulatory elements. Our results also indicate that a modest selection of transcription factors may be sufficient to map most regulatory elements in the human genome.
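The abstract does not state the exact rule used to group transcription factor peaks into regulatory element clusters. As a minimal illustration, the Python sketch below assumes a single-linkage rule: peaks, sorted by position, join the current cluster when they overlap it or start within `max_gap` bp of its end, and each cluster records which factors contributed. The peak coordinates and gap values are hypothetical.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    factor: str


class Cluster(NamedTuple):
    chrom: str
    start: int
    end: int
    factors: frozenset


def cluster_peaks(peaks: List[Peak], max_gap: int = 0) -> List[Cluster]:
    """Merge peaks that overlap or lie within max_gap bp (assumed single-linkage rule)."""
    clusters: List[Cluster] = []
    current = None  # (chrom, start, end, factors) of the cluster being built
    for p in sorted(peaks, key=lambda pk: (pk.chrom, pk.start, pk.end)):
        if current and current[0] == p.chrom and p.start <= current[2] + max_gap:
            # Extend the open cluster and record the additional factor.
            current = (current[0], current[1], max(current[2], p.end), current[3] | {p.factor})
        else:
            if current:
                clusters.append(Cluster(current[0], current[1], current[2], frozenset(current[3])))
            current = (p.chrom, p.start, p.end, {p.factor})
    if current:
        clusters.append(Cluster(current[0], current[1], current[2], frozenset(current[3])))
    return clusters


peaks = [
    Peak("chr1", 100, 200, "CTCF"),
    Peak("chr1", 150, 300, "MAX"),
    Peak("chr1", 1_000, 1_100, "JUND"),
    Peak("chr2", 50, 120, "CTCF"),
]
clusters = cluster_peaks(peaks, max_gap=50)
for c in clusters:
    print(c.chrom, c.start, c.end, sorted(c.factors))
# chr1 100 300 ['CTCF', 'MAX']
# chr1 1000 1100 ['JUND']
# chr2 50 120 ['CTCF']
```

Clusters built this way could then be intersected with histone-mark or open-chromatin intervals to ask which marks best distinguish them, which is the kind of comparison the abstract reports.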

Springer - Publisher Connector

NORA - Norwegian Open Research Archives

BICEPP: an example-based statistical text mining method for predicting the binary characteristics of drugs

Author: A Korhonen
A Koussounadis
C Perez-Iratxeta
CB Giles
D Fourches
DR Swanson
EA Adie
EC Fieller
F Hammann
Frank PY Lin
FS Turner
GR Grimes
Guy Tsafnat
H Gurulingappa
J Freudenberg
JA Hanley
KJ Gaulton
L Màrquez
M Hall
M Krallinger
Matthew P Doogue
MF Porter
N López-Bigas
N Tiffin
P Srinivasan
RJ Epstein
S Aerts
S Raychaudhuri
S Rossi
S Tatar
S Yu
Stephen Anthony
Thomas M Polasek
TM Polasek
V Sintchenko
Y Garten
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Background: The identification of drug characteristics is a clinically important task, but it requires much expert knowledge and consumes substantial resources. We have developed a statistical text-mining approach (BInary Characteristics Extractor and biomedical Properties Predictor: BICEPP) to help experts screen drugs that may have important clinical characteristics of interest.

Results: BICEPP first retrieves MEDLINE abstracts containing drug names, then selects the tokens that best predict the list of drugs representing the characteristic of interest. Machine learning is then used to classify drugs using a document frequency-based measure. Evaluation experiments were performed to validate BICEPP's performance on 484 characteristics of 857 drugs, identified from the Australian Medicines Handbook (AMH) and the PharmacoKinetic Interaction Screening (PKIS) database. Stratified cross-validation revealed that BICEPP was able to classify drugs into all 20 major therapeutic classes (100%) and 157 (of 197) minor drug classes (80%) with areas under the receiver operating characteristic curve (AUC) > 0.80. Similarly, AUC > 0.80 could be obtained in the classification of 173 (of 238) adverse events (73%), up to 12 (of 15) groups of clinically significant cytochrome P450 enzyme (CYP) inducers or inhibitors (80%), and up to 11 (of 14) groups of narrow therapeutic index drugs (79%). Interestingly, the keywords used to describe a drug characteristic were not necessarily the most predictive ones for the classification task.

Conclusions: BICEPP has sufficient classification power to automatically distinguish a wide range of clinical properties of drugs. This may be used in pharmacovigilance applications to assist with rapid screening of large drug databases to identify important characteristics for further evaluation.
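The abstract outlines BICEPP's pipeline (retrieve abstracts per drug, select predictive tokens, classify on a document frequency-based measure) without giving its exact formulas. The Python below is a minimal sketch under stated assumptions: a drug's feature for a token is the fraction of its abstracts containing that token, tokens are ranked by the difference in mean document frequency between drugs with and without the characteristic, and a nearest-centroid rule stands in for the machine-learning classifier the authors used. Drug names and tokens are hypothetical, and MEDLINE retrieval is not shown.

```python
from typing import Dict, List, Set

Corpus = Dict[str, List[Set[str]]]  # drug -> tokenized abstracts mentioning it


def doc_freq(abstracts: List[Set[str]], token: str) -> float:
    """Fraction of a drug's abstracts that contain the token."""
    if not abstracts:
        return 0.0
    return sum(token in doc for doc in abstracts) / len(abstracts)


def mean_df(corpus: Corpus, drugs: List[str], token: str) -> float:
    return sum(doc_freq(corpus[d], token) for d in drugs) / len(drugs)


def select_tokens(corpus: Corpus, positives: List[str], negatives: List[str], k: int) -> List[str]:
    """Keep the k tokens whose mean document frequency best separates the groups (assumed rule)."""
    vocab = set().union(*(doc for docs in corpus.values() for doc in docs))
    gap = {t: mean_df(corpus, positives, t) - mean_df(corpus, negatives, t) for t in vocab}
    return sorted(vocab, key=lambda t: (-gap[t], t))[:k]


def classify(corpus: Corpus, drug: str, tokens: List[str],
             positives: List[str], negatives: List[str]) -> bool:
    """Nearest-centroid stand-in for the paper's machine-learning classifier."""
    def vec(d: str) -> List[float]:
        return [doc_freq(corpus[d], t) for t in tokens]

    def centroid(drugs: List[str]) -> List[float]:
        vectors = [vec(d) for d in drugs]
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    def sq_dist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    x = vec(drug)
    return sq_dist(x, centroid(positives)) < sq_dist(x, centroid(negatives))


# Hypothetical corpus: drugA and drugB have the characteristic, drugC and drugD do not.
corpus = {
    "drugA": [{"cyp3a4", "inhibitor", "plasma"}, {"inhibitor", "interaction"}],
    "drugB": [{"cyp3a4", "inhibitor"}],
    "drugC": [{"rash", "plasma"}],
    "drugD": [{"rash", "dose"}, {"plasma"}],
    "drugE": [{"inhibitor", "cyp3a4", "dose"}],
}
pos, neg = ["drugA", "drugB"], ["drugC", "drugD"]
tokens = select_tokens(corpus, pos, neg, k=2)
print(tokens)  # ['inhibitor', 'cyp3a4']
print(classify(corpus, "drugE", tokens, pos, neg))  # True
```

Ranking tokens by how well they separate the two drug groups, rather than by how closely they match the characteristic's name, is consistent with the abstract's observation that descriptive keywords were not always the most predictive.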

Springer - Publisher Connector

Macquarie University ResearchOnline

An alternative effector gene at the type 2 diabetes-associated TCF7L2 locus?

Author: AH Rosengren
AS Dimas
C Fuchsberger
F Yi
G Koscielny
G da Silva Xavier
KJ Gaulton
M van de Bunt
M Claussnitzer
M Mele
Martijn van de Bunt
Q Xia
SF Boj
TA Bowman
V Lyssenko
Publication venue: Springer Science and Business Media LLC
Publication date
Field of study